Special Issue: Multimedia Semantics

Authors

  • Farshad Fotouhi
  • William I. Grosky
  • Changbo Yang
  • Ming Dong
  • Angelos Hliaoutakis
  • Giannis Varelas
  • Epimenidis Voutsakis
  • Euripides G. M. Petrakis
Abstract

and MeSH terms (MeSH Headings). These descriptions are syntactically analyzed and reduced into separate vectors of MeSH terms, which are matched against the queries according to Equation 3 (as the similarity between expanded and re-weighted vectors). The weights of all MeSH terms are initialized to one, while the weights of terms in titles and abstracts are initialized by tf.idf. The similarity between a query and a document is computed as:

$$\mathrm{Sim}(q,d) = \mathrm{Sim}(q,d_{\text{MeSH-terms}}) + \mathrm{Sim}(q,d_{\text{title}}) + \mathrm{Sim}(q,d_{\text{abstract}}) \tag{5}$$

where $d_{\text{MeSH-terms}}$, $d_{\text{title}}$ and $d_{\text{abstract}}$ are the representations of the document MeSH terms, title and abstract respectively. This formula suggests that a document is similar to a query if its components are similar to the query. Each similarity component can be computed either by VSM or by SSRM.

For the evaluations, we used the subset of 63 queries from the original query set developed by Hersh et al. (1994). The correct answers to these queries were compiled by the editors of OHSUMED and are also available on the Web along with the queries. A document is considered similar to a query if the query terms are included in the document. OHSUMED provides the means for comparing the performance of different methods. However, it is not particularly well suited for semantic information retrieval with SSRM. A better criterion would be to judge whether a document is on the topic of the query (even if it contains lexically different terms).

The results in Figure 6 demonstrate that, for small answer sets (i.e., with fewer than eight answers), SSRM with expansion by very similar terms (T = 0.9) outperforms all other methods (Richardson et al., 1995; Salton, 1989; Voorhees, 1994). For larger answer sets, the method of Voorhees (1994) performs best. For answer sets with 50 documents, all methods (except VSM) perform about the same. SSRM with expansion threshold T = 0.5 performed worse than SSRM with T = 0.9.
An explanation may be that it introduced many new terms, not all of which are conceptually similar to the original query terms.

Int’l Journal on Semantic Web & Information Systems, 2(3), 55-73, July-September 2006. Copyright © 2006, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of Idea Group Inc. is prohibited.

Image Retrieval on the Web

Searching for effective methods to retrieve information from the Web has been at the center of many research efforts during the last few years. The relevant technology evolved rapidly thanks to advances in Web systems technology (Arasu, Cho, Garcia-Molina, Paepke, & Raghavan, 2002) and information retrieval research (Yates et al., 1999). Image retrieval on the Web, in particular, is a very important problem in itself (Kherfi, Ziou, & Bernardi, 2004). The relevant technology has also evolved significantly, propelled by advances in image database research (Smeulders, Worring, Santini, Gupta, & Jain, 2000).

Image retrieval on the Web requires that content descriptions be extracted from Web pages and used to determine which Web pages contain images that satisfy the query selection criteria. Several approaches to the problem of content-based image retrieval on the Web have been proposed. Some have been implemented in research prototypes, for example ImageRover (Taycher, Cascia, & Sclaroff, 1997), WebSEEK (Smith & Chang, 1997), and Diogenis (Aslandongan & Yu, 2000), and in commercial systems such as Google Image Search, Yahoo, and AltaVista. Because methods for extracting reliable and meaningful image content from Web pages by automated image analysis are not yet available, images on the Web are typically described by text or attributes associated with images in HTML tags (e.g., file name, caption, alternate text). These are automatically extracted from the Web pages and are used in retrievals. Google, Yahoo, and AltaVista are example systems of this category.
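The kind of attribute extraction described above can be illustrated with a short Python sketch. It uses only the standard-library `html.parser` module; the class name `ImageTextExtractor`, the helper `describe_images`, and the field names are hypothetical and are not taken from any of the systems cited. Captions are left out because what counts as a caption depends on page layout and is system-specific.

```python
from html.parser import HTMLParser
from os.path import basename, splitext
from urllib.parse import urlparse


class ImageTextExtractor(HTMLParser):
    """Collects the page title and, for each <img>, its file name and alt text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.images = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "img":
            attr = dict(attrs)
            # Keep only the last path component, without its extension.
            path = urlparse(attr.get("src") or "").path
            self.images.append({
                "file_name": splitext(basename(path))[0],
                "alternate_text": attr.get("alt") or "",
            })

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def describe_images(html):
    """Return one text description (a dict of fields) per image in the page."""
    parser = ImageTextExtractor()
    parser.feed(html)
    parser.close()
    return [dict(img, page_title=parser.title.strip()) for img in parser.images]
```

Each returned dictionary plays the role of one image "document"; a real crawler would also need to resolve relative URLs and handle malformed markup.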
We choose the problem of image retrieval based on surrounding text as a case study for this evaluation. SSRM has been evaluated through IntelliSearch, a prototype Web retrieval system for Web pages and images in Web pages. An earlier system we built supported retrievals using only VSM (Voutsakis, Petrakis, & Milios, 2005). In this article, the system has been extended to support retrievals using SSRM with WordNet as the underlying reference ontology.

Figure 6. Precision-recall diagram for retrievals on OHSUMED using MeSH

The retrieval system of IntelliSearch is built upon Lucene, and the database stores more than 1.5 million Web pages with images. As is typical in the literature (Petrakis, Kontis, Voutakis, & Milios, 2005; Shen, Ooi, & Tan, 2000; Voutsakis et al., 2005), the problem of image retrieval on the Web is treated as one of text retrieval as follows: images are described by the text surrounding them in the Web pages (i.e., captions, alternate text, image file names, page title). These descriptions are syntactically analyzed and reduced into term vectors, which are matched against the queries. Similarly to the previous experiment, the similarity between a query and a document (image) is computed as:

$$\mathrm{Sim}(q,d) = \mathrm{Sim}(q,d_{\text{image-file-name}}) + \mathrm{Sim}(q,d_{\text{caption}}) + \mathrm{Sim}(q,d_{\text{page-title}}) + \mathrm{Sim}(q,d_{\text{alternate-text}}) \tag{6}$$

For the evaluations, 20 queries were selected from the list of the most frequent Google image queries. These are short queries containing between 1 and 4 terms. The evaluation is based on human relevance judgments by five human referees. Each referee evaluated a subset of four queries for both methods.
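As an illustration of Equation 6, the following Python sketch sums per-field similarities between a query and an image description. It is a minimal sketch under stated assumptions, not the IntelliSearch implementation: each component is a VSM-style cosine similarity, term weights are raw term frequencies rather than tf.idf, tokenization is a plain whitespace split, and the names `FIELDS`, `term_vector`, `cosine` and `sim` are hypothetical.

```python
import math
from collections import Counter

# Field names mirror the four components of Equation 6 (names are assumptions).
FIELDS = ("image_file_name", "caption", "page_title", "alternate_text")


def term_vector(text):
    """Lower-cased bag of words weighted by raw term frequency (an assumption)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term vectors; 0.0 if either is empty."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sim(query, image):
    """Sum of per-field similarities between the query and one image description."""
    q = term_vector(query)
    return sum(cosine(q, term_vector(image.get(field, ""))) for field in FIELDS)


# Usage: rank two hypothetical image descriptions for a one-term query.
images = [
    {"image_file_name": "jaguar cat", "caption": "a jaguar resting",
     "page_title": "big cats", "alternate_text": ""},
    {"image_file_name": "sports car", "caption": "red car",
     "page_title": "car dealer", "alternate_text": "car photo"},
]
ranked = sorted(images, key=lambda d: sim("jaguar", d), reverse=True)
```

Because the score is a plain sum, an image whose file name and caption both match the query outranks one that matches in a single field. Replacing `cosine` with a semantic similarity function would give an SSRM-style variant of the same formula.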
Figure 7 indicates that SSRM is far more effective than VSM, achieving up to 30% better precision and up to 20% better recall. A closer look into the results reveals that the effectiveness of SSRM is mostly due to the contribution of non-identical but semantically similar terms. VSM (like most classical retrieval models relying on lexical term matching) ignores this information. In VSM, query terms may also be expanded with synonyms. Experiments with and without expansion by synonyms are presented. Notice that VSM with query expansion by synonyms improved the results of plain VSM only marginally, indicating that the performance gain of SSRM is not due to the expansion by synonyms but rather due to the contribution of semantically similar terms.

Figure 7. Precision-recall diagram for retrievals on the Web using WordNet (axes: recall, precision; curves: SSRM (T=0.9), VSM with Query Expansion, VSM)

CONCLUSION

This article makes two contributions. The first contribution is to experiment with several semantic similarity methods for computing the conceptual similarity between natural language terms using WordNet and MeSH. To our knowledge, similar experiments with MeSH have not been reported elsewhere. The experimental results indicate that it is possible for these methods to approximate algorithmically the human notion of similarity, reaching correlation (with human judgment of similarity) up to 83% for WordNet and up to 74% for MeSH. The second contribution is SSRM, an information retrieval method that takes advantage of this result.
SSRM outperforms VSM, the classic information retrieval method, and demonstrates promising performance improvements over other semantic information retrieval methods in retrieval on OHSUMED, a standard TREC collection of medical documents that is available on the Web. Additional experiments have demonstrated the utility of SSRM in Web image retrieval based on text image descriptions extracted automatically. SSRM has also been tested on Medline, the premier bibliographic database of the U.S. National Library of Medicine (NLM) (Hliaoutakis et al., 2006). All experiments confirmed the promise of SSRM over classic retrieval models. SSRM can work in conjunction with any taxonomic ontology like MeSH or WordNet and any associated document corpus. Current research is directed towards extending SSRM to work with compound terms (phrases) and more term relationships (in addition to the Is-A relationships).

ACKNOWLEDGMENT

Dr. Qiufen Qi of Dalhousie University prepared the MeSH terms and the queries for the experiments with MeSH and evaluated the results of retrievals on Medline. We thank Nikos Hurdakis and Paraskevi Raftopoulou for valuable contributions to this article. The U.S. National Library of Medicine provided us with the complete data sets of MeSH and Medline. This article was funded by project MedSearch/BIOPATTERN (FP6, Project No 508803) of the European Union (EU), the Natural Sciences and Engineering Research Council of Canada, and IT Interactive Services Inc.

REFERENCES

Arasu, A., Cho, J., Garcia-Molina, H., Paepke, A., & Raghavan, S. (2002). Searching the Web. ACM Transactions on Internet Technology, 1(1), 2-43.
Aslandongan, Y. A., & Yu, C. T. (2000). Evaluating strategies and systems for content-based indexing of person images on the Web. In Proceedings of the International Conference on Multimedia (pp. 313-321).
Attar, R., & Fraenkel, A. S. (1977). Local feedback in full text retrieval systems. Journal of the ACM, 23(3), 397-417.
Collins-Thomson, K., & Callan, J. (2005). Query expansion using random walk models. In Proceedings of CIKM (pp. 704-711).
Hersh, W. R., Buckley, C., Leone, T. J., & Hickam, D. H. (1994). OHSUMED: An interactive retrieval evaluation and new large. In Proceedings of ACM SIGIR (pp. 192-201).
Hliaoutakis, A., Varelas, G., Petrakis, E. G. M., & Milios, E. (2006). MedSearch: A retrieval system for medical information based on semantic similarity. In Proceedings of ECDL (pp. 512-515).
Jiang, J. J., & Conrath, D. W. (1998). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the International Conference on Research in Computational Linguistics.
Kherfi, M. L., Ziou, D., & Bernardi, A. (2004). Image retrieval from the World Wide Web: Issues, techniques, and systems. ACM Computing Surveys, 36(1), 35-67.
Leacok, C., & Chodorov, M. (1998). Combining local context and WordNet similarity for word sense identification. In C. Fellbaum (Ed.), WordNet: An electronic lexical database and some of its applications (chap. 11). Boston: MIT Press.
Li, Y., Bandar, Z. A., & McLean, D. (2003). An approach for measuring semantic similarity between words using multiple information sources. IEEE Transactions on Knowledge and Data Engineering, 15(4), 871-882.
Lin, D. (1993). Principle-based parsing without overgeneration. In ACL (pp. 112-120).
Liu, S., Liu, F., Yu, C., & Meng, M. (2004). An effective approach to document retrieval via utilizing WordNet and recognizing phrases. In Proceedings of ACM SIGIR (pp. 266-272).
Lord, P. W., Stevens, R. D., Brass, A., & Goble, C. A. (2003). Investigating semantic similarity measures across the gene ontology: The relationship between sequence and annotation. Bioinformatics, 19(10), 1275-1283.
Mandala, R., Takenobu, T., & Hozumi, T. (1998). The use of WordNet in information retrieval. In COLING/ACL (pp. 469-477).
Mihalcea, R., Corley, C., & Strapparava, C. (2006, July). Corpus-based and knowledge-based measures of text semantic similarity. In Proceedings of the American Association for Artificial Intelligence (AAAI 2006). Boston.
Miller, G., & Charles, W. (1991). Contextual correlates of semantic similarity. Language and Cognitive Processes, 6(1), 1-28.
Patwardhan, S., Banerjee, S., & Petersen, T. (2003, February). Using measures of semantic relatedness for word sense disambiguation. In Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics, Mexico City, Mexico (pp. 241-257).
Petrakis, E., Kontis, K., Voutakis, E., & Milios, E. (2005). Relevance feedback methods for logo and trademark image retrieval on the Web. In Proceedings of ACM SAC, IAR (pp. 23-27).
Possas, B., Ziviani, N., Meira, W., & Neto, B. R. (2005). Set-based vector model: An efficient approach for correlation-based ranking. ACM Transactions on Information Systems, 23(4), 397-429.
Qiu, Y., & Frei, H. P. (1993). Concept-based query expansion. In Proceedings of SIGIR (pp. 160-169).
Rada, R., Mili, E., Bicknell, E., & Blettner, M. (1989). Development and application of a metric on semantic nets. IEEE Transaction on Systems, Man, and Cybernetics, 19(1), 17-30.
Resnik, O. (1999). Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity and natural language. Journal of Artificial Intelligence Research, 11, 95-130.
Richardson, R., & Smeaton, A. (1995). Using WordNet in a knowledge-based approach to information retrieval (Working Paper: CA-0395). Dublin, Ireland: School of Computer Applications.
Richardson, R., Smeaton, A., & Murphy, J. (1994). Using WordNet as a knowledge base for measuring semantic similarity between words (Working paper CA-1294). Dublin, Ireland: School of Computer Applications, Dublin City University.
Rochio, J. J. (1971). Relevance feedback in information retrieval. In G. Salton (Ed.), The SMART retrieval system: Experiments in automatic document processing (pp. 313-323). Englewood Cliffs, NJ: Prentice Hall.
Rodriguez, M. A., & Egenhofer, M. J. (2003). Determining semantic similarity among entity classes from different ontologies. IEEE Trans. on Knowledge and Data Engineering, 15(2), 442-456.
Rui, Y., Huang, T. S., Ortega, M., & Mechrota, S. (1998). Relevance feedback: A power tool for interactive content-based image retrieval. IEEE Transaction on Circ. and Syst. for Video Technology, 8(5), 644-655.
Salton, G. (1989). Automatic text processing: The transformation analysis and retrieval of information by computer. Reading, MA: Addison-Wesley.
Seco, N., Veale, T., & Hayes, J. (2004). An intrinsic information content metric for semantic similarity in WordNet. Ireland: Dept. of Computer Science, University



Publication date: 2006